FE-883: Orchestrator operational improvements (serve reliability + Opus 4.8)#224
FE-883: Orchestrator operational improvements (serve reliability + Opus 4.8)#224kostandinang wants to merge 47 commits into
Conversation
PR SummaryMedium Risk Overview Cook execution and sandboxes: Per-slice git worktrees are provisioned lazily on first use ( Git artifacts: Run/slice refs move from Live CLI: The event bus gains Defaults and misc: Architect/cook models default to Opus ( Reviewed by Cursor Bugbot for commit 21e1653. Bugbot is set up for automated code reviews on this repo. Configure here. |
90bb5ef to
40a9d88
Compare
5b145b2 to
7d39fe7
Compare
a5bfc10 to
ac4e47c
Compare
40a9d88 to
05b471a
Compare
05b471a to
89e7850
Compare
ac4e47c to
3e02bfb
Compare
Each cook agent action (write-tests, write-code, verify-epic) runs under a per-action wall-clock budget enforced in pi-actions.ts. Raise the default from 300s to 600s so Sonnet agents have headroom on larger slices and on brownfield repos where setup/discovery eats into the turn. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…eam, clean failures Iterating on the live TUI from real-terminal feedback: - One global run timer in the footer instead of a per-item clock on every pending row (and whole-second, no jittery decimals). - "brunch" wordmark is now a big lowercase figlet (Slant) in a warm orange gradient, replacing the egg. - Activity log + wordmark stream through Ink <Static> so the full run lands in scrollback instead of collapsing in a redrawn bounded box; line cap removed. - Brigade tracker no longer lights "taste" mid-cook — per-slice verify actions fire during cooking, so taste stays unlit until a real end-of-cook signal. - Failures throw instead of process.exit, so withCookBus disposes (unmounts Ink) before the error prints — no more frozen "prep ◐" hang. cook validates args before mounting the TUI and rejects unknown flags (e.g. --spec-id). check + presenter/cook/pi-actions tests green; full build deferred (active graphite stack navigation). Co-Authored-By: Claude <noreply@anthropic.com>
Wires the two remaining kitchen-brigade phases faithfully to the orchestrator-arcs mapping (verify→taste, ship→serve): - taste lights on the epic-verification verdict (action `epic <id> → …`), not on per-slice `verify <target>` lines — those fire mid-cook and previously lit taste while still cooking. - serve lights on a new `cook-done` event emitted at the end of runCook (after promotion); a halted run never ships, so it never lights serve. phase.test covers both signals + the full prep→cook→taste→plate→serve walk; check + presenter/cook tests green. Co-Authored-By: Claude <noreply@anthropic.com>
Each cook pi session was a black box in the pending panel — just a KB count.
runPi already subscribes to the session's text stream; instead of bytes,
surface the agent's latest non-empty line (tail-truncated, throttled every
2 KB) as the activity-progress detail, so a wait reads as live work ("agent
writing tests · …adds the RefreshToken guard") rather than "still going".
Kept headless createAgentSession — no pi InteractiveMode, no new pi API: pi's
tool-call events come via an extension hook (on('tool_call')), not the
subscribe stream, so a richer "editing <file> / running <tool>" heartbeat is a
separate follow-up that needs the extension-registration path verified.
check + pi-actions/presenter tests green.
Co-Authored-By: Claude <noreply@anthropic.com>
Richer "what the agent is doing" in the pending panel (the spike's Option A, full tier): instead of only the agent's latest line, show the tool calls — "edit src/auth/token.ts", "bash bun test", "grep RefreshToken". pi exposes no tool-call hook on session.subscribe (text/lifecycle only), so buildSessionOptions now supplies the built-in tools itself via customTools + noTools:'builtin': each createXToolDefinition(cwd) is wrapped to emit a label from its params, then delegates unchanged. The builders bake in the real config (withFileMutationQueue, truncation defaults), so behavior is preserved — confirmed in pi's edit.js. Observation is fail-safe (emit in try/catch). toolLabel + instrumentToolDefinition are pure/unit-tested (label mapping; wrap delegates same args + result; observer error can't break a tool call). Caveat: the customTools/noTools runtime wiring isn't covered by tests (they stub createSession, bypassing buildSessionOptions) — needs a real cook run to confirm the agent receives the instrumented tools and they emit live. check + pi-actions tests green. Co-Authored-By: Claude <noreply@anthropic.com>
Brownfield cook provisioned every slice's git worktree eagerly in wireHandlers — N `git worktree add` + N recursive node_modules CoW copies paid synchronously at startup before any slice fired. - Move slice-worktree creation into resolveSliceCwd via idempotent ensureSliceWorktree, so a slice's worktree is materialized on first fire. A run touching 2 of 8 slices pays for 2 worktrees, not 8. Synchronous provisioning serializes concurrent fires on the JS thread, so parallel-policy worktree adds never overlap. - Symlink each slice's node_modules to the parent worktree's single copy instead of CoW-copying per slice (SHAREABLE_TOP_LEVEL_ENTRIES). walkFiles already skips symlinks, so the shared tree is never re-walked during dep seeding, merge, or promotion. Other gitignored dirs still copy per slice. Correctness-neutral: same worktrees/branches, just lazy; deps resolve through the symlink. npm run verify green; adds symlink + idempotency unit tests. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
`brunch serve --land` opt-in: after a brownfield cook promotes to cook/<runId>, merge that branch into the repo's checked-out branch as serve's final step. landCookBranch refuses on a dirty tree / detached HEAD and aborts on conflict — the cook branch is always left intact. Plain `cook` and default `serve` are unchanged, so FE-877's "never freelance into the working branch" invariant holds unless opted in. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
The interviewer (interview.ts) and the three cook agent actions (write-tests / write-code / verify-epic in pi-actions.ts) move from claude-opus-4-6 to claude-opus-4-8. Secondary-chat and the plan-architect default are left on 4.6 intentionally. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
363cba6 to
91a281d
Compare
Opus 4.8 rejects thinking.type=enabled ("Use thinking.type.adaptive and
output_config.effort"). Swap the interviewer's enabled/budgetTokens block
for { type: 'adaptive' } + effort: 'medium'. Only interview.ts sets explicit
thinking provider options; secondary-chat and plan-architect don't.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The grid made the happy path legible; this does the unhappy path.
- events: slice + cook-done gain `reason?`.
- pi-actions: failed-slice emits carry a reason — evaluate-done maps failureKind
('infra error' / 'tests failed'); write-tests/write-code catches use
'test authoring failed' / 'code authoring failed'.
- cook-cli: cook-done carries result.reason on halt.
- run-store: slice rows store `reason`; RunState.haltReason set from
cook-done(ok:false), unset on completion.
- ink: a failed slice row shows its reason (red); a HaltSummary footer pins
"✗ halted · <reason>" (truncated) when the run halts.
Retry-attempt counts ("3/3") DEFERRED — retryCount lives on the petri-net
tokens, not the action context; that's a separate slice.
Verified in an isolated worktree off the TUI branch: check clean; presenter +
pi-actions + cook tests green (141). The one full-suite failure is
build-boundary.test (client production-chunk manifest) — unrelated to this
server-only change; a worktree/symlinked-node_modules artifact, re-verify on a
normal checkout.
Co-Authored-By: Claude <noreply@anthropic.com>
Completes the retry half of failure legibility — without net-token plumbing. A slice loops failed→running on retry, so the attempt count is derivable from the slice-event stream the store already folds: queued→running is attempt 1, a step change mid-run keeps it, failed→running bumps it. The grid shows "attempt N" only for N≥2 (no clutter on the first run). No engine/net changes: retryCount on the petri tokens isn't needed — the observable slice transitions carry the same information. check + presenter tests green (46). Co-Authored-By: Claude <noreply@anthropic.com>
The attempt count now reads "attempt 2/4" — n over the total attempts allowed (retry budget + 1). cook-cli emits maxRetries on run-shape; the store derives maxAttempts = maxRetries + 1; the grid shows it only once a slice has retried (n ≥ 2), and falls back to a bare "attempt n" when the budget is unknown. Denominator confirmed against the net: retryCount halts at `>= maxRetries` (starts 0), so a slice gets maxRetries + 1 total attempts. check + presenter tests green (49). Co-Authored-By: Claude <noreply@anthropic.com>
Opus 4.8 otherwise splits a single `ask_question` into several parallel partial calls (one option each, or leaked tool-call XML), which land as `output-error` parts. - interview.ts: set `disableParallelToolUse` so the model emits one tool call per step. - app.ts: `stripFailedToolParts` drops echoed `output-error` tool parts before `validateUIMessages`. They carry `rawInput` but no `input`, so the output-error schema rejects them and bricks the whole follow-up turn; the retried call is what matters, the dead attempt carries no history value. - app.test.ts: covers a follow-up turn whose history contains a failed attempt followed by a successful retry. verify green (check + test + build). 🍳 Built with brunch Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
Evaluate gate (phantom failure fix):
- Add `absent` TestFailureKind for vitest "No test files found" (nothing built
yet), distinct from a genuine `test` red ("Cannot find module" stays `test`).
- runVerification aggregate precedence infra > test > absent.
- evaluate-done no longer emits `failed` for the unbuilt greenfield gate — the
slice stays running and flows to write-tests as the SAME attempt, killing the
phantom n/max + ✗ NEEDS WORK on every clean slice.
- Strip `[agent-tail]` harness noise from captured runner output.
Promotion ref namespace + composer:
- run-refs.ts: centralize the run git-ref namespace (brunch/run/<id>,
brunch/slice/<id>/<sid>), replacing duplicated cook/ + cook-slice/ strings;
prune stale worktree registrations before `git worktree add`. On-disk
.brunch/cook/runs/ path unchanged (FE-743 compat).
- run-artifact.ts: fold per-slice branches into the run branch via
`git merge-tree` plumbing in dependency order — real conflicts surface
fail-closed, one merge node per slice, `squash` granularity retained. Pure
plumbing: no working tree / index / active branch touched. Not yet wired into
cook-cli (needs a live-run check of the dependency-seed interaction).
🍳 Built with brunch
Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>
Cook runs write per-slice worktrees under .brunch/cook/runs/<id>/worktree/, which Vite's default file watcher picks up — every slice edit triggered a page reload and tsconfig-change full reload in the host dev server. - vite-client.ts: server.watch.ignored = ['**/.brunch/**']. Merged with Vite's built-in ignores (.git, node_modules, cacheDir), so run artifacts no longer drive HMR. 🍳 Built with brunch Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>
project-detect.test.ts had the "resolves the runner from workspace packages in a monorepo" describe block verbatim twice — remove the copy. 🍳 Built with brunch Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 4 total unresolved issues (including 2 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit f96b365. Configure here.
| const relocated = withTestDir(PROFILES['node-vitest'].toolchain, 'src'); | ||
| expect(relocated.testCommand('src/x.test.ts')).toEqual(['npx', 'vitest', 'run', 'src/x.test.ts']); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
Duplicated withTestDir test block
Low Severity
The same describe('withTestDir relocates test targets…') suite appears twice in project-profile.test.ts, with identical cases duplicated verbatim starting at line 168.
Reviewed by Cursor Bugbot for commit f96b365. Configure here.
Greenfield serves accepted --land but never landed: landing only runs in the brownfield (codebase) promotion path. Greenfield has no repo branch to land onto, so surface the no-op instead of silently swallowing the flag. 🍳 Built with brunch Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
write-tests and write-code emit a failed slice event before rethrowing on runPi failure; verify-epic did not, so epic-verification failures left the slice grid without a failed row or reason. Mirror the wrapper on the epic's representative slice. 🍳 Built with brunch Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
mergeSlicesIntoEpicSandbox builds __epic__ trees via walkFiles, which skips symlinks, so the parent's symlinked node_modules never reached the merged cwd. verify-epic and probes then ran without resolvable deps even though per-slice verification passed. Re-link the shareable entries after the merge. 🍳 Built with brunch Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
The `withTestDir relocates test targets…` describe block was copied verbatim a second time. Remove the duplicate. 🍳 Built with brunch Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
Cook actions (write-tests/write-code/verify-epic) and the interview already run on claude-opus-4-8; the architect default lagged at claude-opus-4-6. Lift it to 4-8 so plan, cook, and interview paths share the latest Opus. 🍳 Built with brunch Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
… detail Three serve/plan presentation fixes the dogfood run surfaced: - Recipe banner: the header said "brunch plan" / "✓ plan" while the phase tracker right below it reads "recipe". Rename both to "recipe" so the planning surface speaks one brigade term (the CLI command stays `plan`; this is the phase label). - out folder: plan-start showed the launch root (e.g. the repo dir). Emit the spec plan dir (.brunch/cook/specs/<id>) — the exact place plan.yaml lands. - Verbose warnings: --verbose now appends a one-sentence plain-English account after each terse warning code (explainEmitterWarning / explainReconciliationWarning), so a reviewer needn't know the code vocabulary. Default verbosity keeps the terse line. Also fix a pre-existing typecheck break in run-artifact.test.ts (mode 'codebase' → 'greenfield'; PlanMode is greenfield|brownfield) that was reddening the gate. 🍳 Built with brunch Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>



Stack Context
Stacks on FE-878 (#222). FE-883 — operational improvements under the FE-864 brownfield-orchestration umbrella (planning PR #212). No longer the single timeout tweak it started as — this branch collects the operational improvements that make
brunch serveusable on real brownfield runs.What?
DEFAULT_ARCHITECT_MODEL_ID) all runclaude-opus-4-8; the interviewer uses adaptive thinking + effort, and a fix stops 4.8 fragmentingask_questioninto failed tool parts.verify-epic), keep the grid current during verification, surface the failure reason + a pinned halt summary, show per-slice attempt count, and a "not-started" evaluate gate so a greenfield slice isn't painted ✗ before its tests exist.serve --land— merge the promoted cook branch into the active branch as the final serve step; greenfield runs (no repo branch to land onto) warn instead of silently ignoring the flag.run-artifact.ts/run-refs.ts) composes the completed run.__epic__tree re-links the parent's symlinkednode_modulessoverify-epicand probes have resolvable deps..brunchcook worktrees.Why?
The spec-23 run exposed serve reliability as a broad issue, not a single timeout: actions timed out on real repos, the architect's model default produced weaker fallback plans that amplified slice counts, and slice failures — especially writer and epic-verify aborts — vanished from the grid with no reason. This branch is the FE-864 umbrella's implementation seam for those concrete operational fixes.
Tests
Per-seam unit coverage for the changes above:
serve --landarg mapping + greenfield warning,verify-epicfailed-row emission, the epic-mergenode_modulesre-link, the architect default model id, the run-artifact composer, and the slice-grid presenter.